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DOWN SCALING OF IMAGES 

TECHNICAL FIELD 

The present invention relates to a method and a device for computing the 
Discrete Cosine Transfonn (DCT) for image and video transcoding and 
scalable video coding. 

BACKGROUND OF THE INVENTION AND PRIOR ART 
It is reasonable to expect that in the future a wide range of quality video 
services like High Definition TV (HDTV) will^ae available together with 
Standard Definition TV (SDTV), and video services of lower quality such as 
videophone and videoconference. Multimedia documents containing video 
will most probably not only be retrieved over computer networks, but also 
over telephone lines. Integrated Services Digital Network (ISDN), 
Asynchronous Transfer Mode (ATM), or even mobile networks. 

The transmission over several types of links or networks with different bit 
rates and varying traffic load will require an adaptation of the bit rate to the 
available channel capacity. The main constraint of the systems is that the 
decoding of any level below the one associated with the transmitted format 
should not need the complete decoding of the transmitted source. 

In order to maximise the integration of these various quality video services, a 
single coding scheme which can provide an unlimited range of video sen/ices 
is desirable. Such a coding scheme would enable users of different qualities 
to communicate with each other. For example, a subscriber to only a lower 
quality video service should be capable of decoding and reconstructing a 
digitally transmitted higher quality video signal, albeit at the lower quality 
service level to which he subscribes. Similarly, a higher quality service 
subscriber should be capable of decoding and reconstructing a digitally 
transmitted lower quality video signal although, of course, its subjective 
quality will be no better than that of the transmitted quality. 

The problem therefore is associated with the way in which video will be 
transmitted to subscribers with different requirements (picture quality, 
processing power, memory requirements, resolution, bandwidth, frame rate, 
etc.). The following points summarise the requirements: 

• satisfy users having different bandwidth requirements, 

• satisfy users having different computational power. 

• adapt frame rate, resolution and compression ratio to user preferences 
and available bandwidth, 

• adapt frame rate, resolution and compression ratio to network abilities, 

• short delay, and 

• conform with standards. If required. 




One solution to the problem of satisfying the different requirements of the 
receivers is the design of scalable bitstreams. In this form of scalability, there 
is usually no direct interaction between a transmitter and a receiver. Usually, 
the transmitter is able to make a bit stream which consists of various layers 
which can be used by receivers with different requirements in resolution, 
bandwidth, frame rate, memory or computational complexity. If new receivers 
are added which do not have the same requirements as the existing ones, 
then the transmitter has to be re-programmed to accommodate the 
requirements of the new receivers. Briefly, in bit stream scalability, the 
abilities of the decoders must be known in advance. 

A different solution to the problem is the use of transcoders. A transcoder \ 
accepts a received data stream encoded according to a first coding scheme 
and outputs an encoded data stream encoded according to a second coding \ 
scheme. If one had a decoder which operates according to a second coding 
scheme then such a transcoder would allow reception of the transmitted 
signal encoded according to the first coding scheme without modifying the 
original encoder. 

One situation that usually appears especially in multiparty conferences is that 
a particular receiver has a different bandwidth ability and/or a different 
computational requirements. For example, in a multipoint communication with 
participants connected through ISDN and Public Switched Telephone 
Network (PSTN), the bandwidth can vary from 28.8 kbits/s (PSTN) to more 
than 128 kbits/s (ISDN). Since video transmitted at as high bit rates as 128 
kbits/s can not be transferred over PSTN lines, video transcoding has to be 
implemented in the Multipoint Control Unit (MCU) or Gateway. 

This transcoding might has to implemenl^ a spatial resolution reduction of the 
video in order to fit into the bandwidth f\ a particular receiver. For example, 
an ISDN subscriber might be transmitdng video in Common Intermediate 
Format (CIF) (288x352 pixels), whil^a PSTN subscriber might be able to 
receive video only in a Quad Compnon Intermediate Format (QCIF) 
(144x176). Another example is v^en a particular receiver does not have the 
computational power to decpde/at a particular resolution and therefore a 
reduced resolution video has to be transmitted to that receiver. Additionally, 
transcoding of HDTV to SDTv requires a resolution reduction. 

For example, the transcoder could be used to convert a 128 kbit/s video 
signal in CIF format confomning to ITU-T standard H.261, from an ISDN video 
terminal for transmission to a 28.8 Kbit/s video signal in QCIF format over a 
telephone line using ITU-T standard H.263. 
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It should also be noted that many scalable video coding systems require both 
the use of 8x8 and 4x4 DCT. For example, in L.H. Kieu and K.N. Ngan, "Cell- 
loss concealment techniques for layered video codecs in an ATM network", 
IEEE Trans. On Image Processing, Vol. 3, No. 5, pp. 666-677, September 
1994, a scalable video coding system is described in which the base layer 
has lower resolution compared to the enhancement layer. In that system, an 
8x8 DCT is applied in each of the 8x8 blocks of the image and the 
enhancement layer is compressed using the 8x8 DCT. The base layer uses 
the 4x4 out of the 8x8 DCTs of each block of the enhancement layer and is 
compressed using only 4x4 DCTs. This however is not beneficial since a 4x4 
DCT usually results in reduced performance compared to the 8x8 DCT and it 
requires also that encoders and decoders have to be able to handle 4x4 
DCTs/IDCTs. 

The traditional method of downsampling an image consists of two steps, see 
J. Bao, H. Sun, T.C. Poon, "HDTV down conversion decoder", IEEE Trans. 
On Consumer Electronics, Vol. 42. No. 3, pp. 402-410. August 1996. First the 
image is filtered by an anti-aliasing low pass filter. The filtered image is 
downsampled by a desired factor in each dimension. For a DCT-based 
compressed image, the above method implies that the compressed image 
has to be recovered to the spatial domain by inverse DCT and then undergo 
the procedure of filtering and downsampling. If the image is to be compressed 
and transmitted again, this requires an extra forward DCT after the 
undersampling stage. This can be the case in which the undersampling takes 
place in a Multipoint Control Unit - MCU in order to satisfy the requirements 
and bandwidth of a particular receiver, or in scalable video coding schemes. 

In a different method, that works in the compressed domain, both the 
, operations of filtering and downsampling are combined in the DCT domain. 
This is done by cutting DCT coefficients of high frequencies and using the 
inverse DCT with a lower number of DCT coefficients in order to reconstruct 
the reduced resolution image. For example, one can use the 4x4 out of the 
8x8 and perform the IDCT of these coefficients in order to reduce the 
resolution by a factor of 2 in each dimension. This does not result in 
significant compression gains and additionally requires that receivers are 
able to handle 4x4 DCTs. 

Furthermore, this method results in significant amount of block edge effects 
and distortions, due to the poor approximations introduced by simply 
discarding higher order coefficients. The above method would be more useful 
if one had 16x16 DCT blocks and were keeping the low frequency 8x8 DCT 
coefficients in order to obtain the downsampled image. However, most image 
and video compression standard methods like JPEG, H.261, MPEG1, 
MPEG2 and H.263 segment the images into rectangular blocks of size 8x8 



pixels and appl|^|P OCT onto these blocks. Therefor^pRly 8x8 DCTs are 
available. A way to compute the 16x16 DCT coefficients is to apply Inverse 
DCT on each of the 8x8 blocks and reconstruct the image. Then the DCT on 
blocks of size 16x16 can be applied and the 8x8 out of the 16x16 DCTs 
coefficients of each block can be kept, if a resolution reduction by a factor of 
2 in each dimension is required. 

This, however, requires complete decoding (perform 8x8 IDCTs) and re- 
transforming by performing 16x16 DCTs (16x16 DCT hardware would be 
required). However, if one could compute the 8x8 out of the 16x16 DCT 
coefficients by using only 8x8 transformations, then this method would be 
faster and also would perform better than the one that uses the 4x4 out of the 
8x8. It would also mean that computation of DCTs of size 16x16 is avoided 
and reduced memory requirements are obtained. Furthermore, US A 5 1 07 
345 describes an adaptive DCT scheme used in coding. The scheme uses 
2x2, 4x4, 8x8 and 16x16 DCTs in'order to obtain a flexible bit rate which can 
be modified according to the available transmission capacity. Our scheme 
provides a fast computation to this adaptive scheme. 



SUMMARY 

It is an object of the present invention to provide a method and a device 
which overcomes the problems associated with the use of DCT of different 
sizes as outlined above. This object and others are obtained by a method 
and a devices for the computation of an N-point DCT using only transforms of 
size N/2 as set out in the claims. The present invention also provides a direct 
computational algorithm for obtaining the DCT coefficients of a signal block 
taken from two adjacent blocks, i.e. it can be used for directly obtaining the N 
point DCT of an original sequence from 2 N/2 DCTs, which represent the 
DCT coefficients of the first N/2 data points of the original sequence and the 
last N/2 data points of the original sequence, respectively. 

Furthennore, a method that can be used for decreasing the spatial resolution 
of the incoming video is also obtained. The method provides lower spatial 
resolution reconstructed video with good picture quality, less complexity and 
memory requirements. It can be applied to image and/or video transcoding 
from a certain resolution factor to a lower one, while in the compressed 
domain. It can also be applied in scalable video coding and in adaptive video 
coding schemes. The main advantage of the scheme is that it requires DCT 
algorithms of standard size (8x8 in the case of the existing video standards) 
and results in better performance compared to existing schemes. 



BRIEF DESCRIPTION OF THE DRAWINGS 
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The present invention will now be described by way of non-limiting examples 
and with reference to the accompanying drawings, in which: 

- Fig. 1 is a diagram illustrating a multipoint communication system. 

- Fig. 2 is a flow chart, which shows the different steps carried out when 
transcoding a GIF image to QCIF in the DCT domain. 

- Fig. 3 is a flow chart illustrating different steps carried out when transcoding 
a still image by reducing the resolution by a factor 2 in each dimension. 

- Fig. 4 is a general view of a video transcoder. 

- Fig. 5 is an illustration of the steps performed in the DCT domain when 
executing the algorithm as described herein. 

DESCRIPTION OF PREFERRED EMBODIMENTS 
In fig 1, a transmission system for digitised images is shown. Thus, in this 
example three users 101, 103 and 105 are connected to each other via an 
MCU 107. The users in this case have different capabilities. Users 101 and 
105 are connected via 128 kbit/s ISDN connections, while user 103 is 
connected via a 28.8 kbit/s PSTN connection. In a point-to-point 
communication, users 101 and 103 can also be connected through a 
gateway. 

In such a case, users 101 and 105 may transmit video signals in a CIF format 
to each other. However, if user 103 wants to receive the video signal 
transmitted between the users 101 and 105, he/she is unable to do so, due to 
the limited transmission capacity of his/her transmission line, unless some 
kind of bit reduction is performed in the MCU. 

One way of obtaining this bit reduction at the MCU is to extract the 4x4 low 
frequency coefficients of the 8x8 DCT coefficients of the incoming video from 
users 1 01 and 1 05 and to transmit only these to user 1 03 in order to 
reconstruct the incoming frames in QCIF format through appropriate scaling 
of the motion vectors. This will not be beneficial from a compression and 
quality point of view. Instead, it would be more beneficial if low frequency 8x8 
DCT coefficients were extracted out of 16x16 blocks of DCT coefficients. 
This can then be performed in the following manner without having to use 
DCTs/IDCTs other than 8x8 points. ' 

Let the DCT coefficients of 4 adjacent 8x8 blocks of the CIF image be stored 



point arrays (of DCT coefficients), where N=16 in the following examples. 
Each row k o1 Z consists of row k of block <D, and of row k of block <D^ (i=1 
and j=2 or i=3 and j=4). For each row k of Z, the problem now is to calculate 



in 2D arrays in the form Z = 




- where ( / = 1,2,3,4 ) are (y^y. 



the N point Dcij^en having the N/2 DCT points of d {i=1 andj=2 
or i=3 and j=4). 

In order to solve the problem of calculating the N point DCT from two N/2 
DCT sequences, the following method can be used. Suppose that the 

sequence x. , /=0.1 N-\ is present. Then consider the following sequences: 

y,^x,, /=0.1 iN/2)A , and r, = x.^^,,, i = 0,l,...,(7^/2)- 1. Also assume that 

N -l"" , and assume that hardware for the computation of the N/2-point 
DCT/IDCT is available in the MCU 107. In this specific case N=16, which 
today is the normal case for computing DCT/IDCT since N/2=8, and 8x8 
DCTs are mainly used in standard video coding schemes. 

The problem is to compute the DCT coefficients of by having the DCT 
coefficients of and . For downsampling by a factor of 2, in this case half 
of the DCT coefficients of (the low frequency coefficients) are needed. 

First some necessary definitions are given. The normalised DCT (DCT-II) of 
X, is given by the equation, see K.R. Rao and P. Yip, Discrete Cosine 
Transform: Algorithms, Advantages and Applications, Academic Press Inc., 
1990: 

^^-^'^f^^^^ ^ = 0,U...,Ar-l (1) 

and the inverse DCT (IDCT) is given by the equation: 

|T5^' ^ {2i + \)tar . ^, „ , /ox 



vi^here 



12 ^^^'^^^ (3) 
1 for k * 0 



Notice that = and £,^^, = 1 . 



The normalised DCT-IV of jc, is given by the equation, see the above cited 
book by K.R. Rao et al. 
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and the inverse DCT-IV (IDCT-IV) is given by: 

, = 0,, (5) 

Notice that the DCT-IV and the IDCT-IV are given by the same equation. 

The normalised DST-IV of x, is given by the equation, see the above cited 
bookby K.R. Raoetal. 

^* = -j7^Z^.s'n^^ . k = OX...,N-] (6) 

and the inverse DST-IV (IDST-IV) is given by: 

[Tii^ ^ . (2A: + l)(2/ + l);r (js 

''=v7^S^*'" — ^ — ' ' ^ ^ ^ 

Notice that the DST-IV and the IDST-IV are given by the same equation. 



It should be noted that the normalisation factors -Jl/'N that appear in both 
the fonward and inverse transforms could be merged as 2/ N and moved to 
either the forward or inverse transforms. In the following however the 
normalisation factor -Jl/ N will be kept in both the forward and the inverse 
transforrris. 

Furthermore, both the DST-IV and the DCT-IV can be computed through the 
DCT. In the above cited book by K.R. Rao et al. the software code for the 
computation of the DCT-IV and the DST-IV through the DCT is given. 

Suppose that the DCTs of and z, are denoted as and Z^. respectively 
for A = 0,l,...,(A^/2)-l. 



Two problems are addressed here: 

(a) the computation of the N-point DCT of x, by using only (N/2)-point 
transformations, and 



t 



(b) the computalW of the N-point DCT of x, when are known (i.e. 

one has the DCT coefficients of the N/2-point sequences and 2, ). 
Consider the even-indexed output of A'^, . 

It should be noted that variables / and n are used interchangeably in the 
following equations. 

From eq. (1), for k = 2k 

2 



Z(2n + l)ic;r ^ (2n + \)K7r 

X cos — : r-+ > X, cos 
2{N/2) „i7r,, " 



n=0 



Z(2n + l)#f;r 
V cos — ■ ^+ > x„ , „ cos 
2{N/2) to ^ 



\)K7t 2 4^, 



=i|i-[n-(-o*z.] 



2(A^ / 2) 

[i{N - \ - n) lycK 
2{N I 2) 

{2n + l)K5r 



cos- 



2{N I 2) 



Ar = 0,l (;^/2)-l. 



(8) 



where Z\ are the DCT-II coefficients of z'„=x^_,_„ for /;=0,l,...,(i\r/2)-l. 
Equation (8) denotes that the even-indexed DCT coefficients of x, can be 
computed by the DCT coefficients of and z, , i.e. the even indexed DCT 
coefficients of the N-element array can be obtained from the DCT coefficients 
of the two adjacent N/2 element arrays. 

Furthermore, let R^be the odd-indexed coefficients, i.e. /?» = A'^^,. Then by 
defining 



then 

~(N-^ 



(9) 



/ 2 ( 



(2w + l)(2A + l);r 5^' 



(2n + \){2k-\)7r\ 
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1 rr rT~ IZ' , (2/» + l);r (2/i-+l)A:;r 



s, \2\\N/2 



2 (2w + l)Ar;r 



2{N 12) 



or 



"ill 



{length-N/2 DCT-ll of r„} 



{9a) 



where 



(2M + l);r 
2N 



["FT. ^ (2w + !)/«■ 12 ^r.-r, — (2» + l)/;r 



^2cos 



(2w + l);r 
2N 



2 i-. ^ , (2/j + l)/;r 

J— =— Tf,(K -Z", )cos^^ 

\JV/2t^ ' ' ' 2{N 



2) 



^ (2/i + l);r 

2cos-^^ — 

2N 



= ^„2cos 



(2/? + l);r 
2N 



(9b) 



where 

g„ is a lehgth-N/2 IDCT of ( y,-Z", ). 



Hence R\ is calculated by means of a DCT-ll of r„. where r„ is computed as 
the IDCT-II of the differences r,- 2', multiplied by cosine factors. Both the 
DCT and the IDCT are of length-N/2. 

The odd-indexed outputs 7?^ of the length-N DCT of x„are calculated from 
equation (9) as 



(10) 



Due to the symmetry of the cosine function it is concluded that 



10 



R,^R,, W W (11a) 

and based on this and equation (9) 

For the computation of the even-indexed coefficients only N/2 additions are 
necessary. For the computation of the odd-indexed coefficients N/2+(N/2 - 1) 
additions, N/2 multiplications, one length-N/2 IDCT and one length-N/2 DCT 
are required. This results to a total of Mn multiplications and An additions 
according to the following formulae: 

Mn = (N/2) + 2 Mn/2 

(12a) 

An = (3N/2) - 1 + 2 An/2 

(t2b) 

where Mn/2 / An/2 is the number of multiplications / additions of a length-N/2 
DCT. 

Based on the initial values M2=1 , A2=2 the above equations become: 
Mn = (N/2) log2N 

(13a) 

An = (3N/2) logjN - N + 1 

(13b) 

The complexity is equal to that of a length-N fast DCT computation according 
■ to well known fast algorithms, such as the ones described in H. S. Hou: "A 
Fast Recursive Algorithm for Computing the Discrete Cosine Transform", 
IEEE Trans on ASSP, Vol. ASSP-35, pp. 1445-1461, Oct. 1987.. S. C. Chan 
and K. L Ho: "Direct Methods for Computing Discrete Sinusoidal Transform", 
lEE Proceedings, Vol. 137, Pt. F, No. 6, pp. 433-442, Dec. 1990 and C. W. 
Kok: "Fast Algorithm for Computing Discrete Cosine Transfonn", IEEE Trans 
on Signal Processing, Vol. 45, No. 3, pp. 757-760. Mar. 1997. 

If the multiplications by I/V2 are taken into account, then N-1 multiplications 
are needed in addition and 1 'shift right' for multiplying by 'A However, all 
these multiplications could be absorbed within the quantiser that follows the 
DCT stage. The computational complexity given above could be greatly 
reduced if the sparseness of the data and the weight matrices were taken into 
account. Notice that for a downsampling by a factor of two, the computational 
complexity is reduced even more, since only half of the coefficients in 
equations (7b) and (9a) need to be computed. Another way to calculate the 
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Odd-indexed DCT coefficients of X, is as follows. For * = 2Ar + 1 , eq. (1 ) 
becomes: 



_ IT (2/>l)(2y + l);r 



^ (2/ + l)(2* + l);r 
2^ X,. cos T-r. + 2- w/2 cos 



(2i + N + \)i2k + l)7r 



2N 



/T i;' (2/ -H)(2A:-H);r ^ { (21 + \)i2k + Dtt . „ _ , ;r j 



f- 1 



( 2/ + 1)(2A + l);r ^ ^_ j^,., ^. (2/ + 1)(2A: + l);r 



2N 



= ^(A'l,-(-l)*;r2,). A = 0,1 (A^/2)-l. 



(14) 



Notice that is the DCT-iV of y, and X2, is the DST-IV of z, . This 
-means that X^^^^ can be computed by N/2 point transformations. Since the 
DCT-IV and the DST-IV can be computed through the DCT, this concludes 
that A'j^,, can be computed by a N/2 point DCT. From equation (8), X^, can 
be computed by N/2 point DCTs and therefore an N-point DCT is not needed. 

Below the terms X\, and X2^ of equation (14) are further analysed. 



— 1 



= Z^.cos 



(2/ + 0(2;: + l);r 



1-0 



2N 

N 



InU r2~U (2i + ]){2k + \)7r ^ [2 ^ 



(2/ + l)pn- ^ 

cos ) 

' 2(A^/2) ^ 



k=0X...XN/2)-\ 



(15) 



where by definition 



# 



12 



Therefore Xl^ can be computed by an IDCT followed by a forward DCT-IV of size 
N/2 (and multiplied by J-j^ ). Notice that the cos(.) terms in eq. (15). can be pre- 
computed and stored. 

In a similar manner X2^ can be calculated as: 

t:' . (2/ -Kl)(2^-H);r 
X2, = 2--' «n Tn " 

/ 2 ^' ■ (2/-H)(2A:4-l);r I 2 V . (gj + QP^, ..q, .^,2) 
= feS^'" i^^ V^S"'^'"'^ 2(^/2) ^""^'^ 

(17) 

where by definition 

Therefore ;^2» , can be computed by an inverse DCT followed by a forward DST-IV 
of size N/2 (and multiplied by ). Notice that the cos(.) terms in eq. (17), can 

be pre-computed and stored. 

Notice that in equations (15) and (17). a fast algorithm can be used for the 
computation of the DST-IV and DCT-IV as the one described in H.-C. Chiang and 
J.-C. Liu. "A progressive stmcture for on-line computation of arbitrary length DCT- 
IV and DST-IV transfonns", \EEE Trans. On Circuits and Systems for Video 
Technology, Vol. 6, No. 6, pp. 692-695, Dec. 1996. 

, Alternatively, both the DCT-IV and the DST-IV can be computed through the DCT 
as explained in 2. Wang. "On computing the Discrete Fourier and Cosine 
Transforms", IEEE Trans. On Acoustics. Speech and Signal Processing, Vol. 
ASSP-33, No. 4, pp. 1 341 -1 344, October 1 985. 

Therefore, a separate DCT-IV or DST-IV module is not required. DCT and IDCT is 
used only. Furthermore, for N=16, a 16 point DCT is not required and the standard 
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,. 8 point DCT can be used. This further reduces the complexity of the circuits 

required. Notice also that the cascaded operations of IDCT and DCT-IV (eq. 1 5) as 
well as IDCT and DST-IV (eq. 17) , which are all of size N/2, can be replaced by a 
single N-point IDCT that can be used on a multiplexed basis, as described in N. R. 
Murthy and M. N. S. Swamy, "On a novel decomposition of the DCT and its 
applications", IEEE Trans. On Signal Processing, Vol. 41, No. 1. pp. 480-485, Jan. 
1993. 

This has certain advantages in a hardware implementation of the algorithm. These 
equations therefore imply that standard available DCT hardware can be used to 
compute the N-point DCT by having the DCT coefficients of the 2 adjacent blocks 
of N/2 points that constitute the N points. 

The computational complexity of the algorithm depends on the algorithm used for 
the computation of the DCT and IDCT. The computational complexity appears to be 
similar to the complexity of a scheme that implements two inverse DCTs of size N/2 
and a forward DCT of size N. However, such a scheme would require a N point 
DCT which is not advantageous, since it is supposed that N/2-point DCTs are 
available. Furthermore, the memory requirements are reduced in this scheme since 
an Nrpoint DCT is not needed. 

Notice that the above algorithms will compute all N DCT points. In practice this is 
not required for applications where image downsampling is performed. For 
example, for downsampling by a factor of 2 we have to keep the 8 out of every 16 

DCT points of . Therefore, k = 0,1 iN/4)-\ in equations 8,9,10,12. Pruning 

DCT algorithms as in A.N. Skodras. "Fast Discrete Cosine Transform Pruning", 
IEEE Trans. On Signal Processing, Vol. 42, No. 7, pp. 1833-1837. July 1994, can 
be used in that case to compute only the required number of DCT points. 

The equations given above can be further analysed and simplified. The detailed 
analysis follows below based on equation (14) and separate analysis of X\, and 
X2^ . Parts of equations derived in the previous paragraph are repeated for 
clarification purposes. 



From equation (14) 



14 



y^ V V .n«12L±i)2^lil£ - 



i=0 



2^ cos 77^^ ( — 



1=0 



IN 



Z^gf^CPS ) 



2{NI2) 



-I 



(19) 



2, (2/ + W-k + l);r 

= 2-c°s~ 

1=0 



2N 



\nI2 



4 



4 



^ ^ (2/ + l);>;r ^ „ (2 



2(N/2) 



By defining the sequences and y2 as: 



= y2^^ 



for p = 0,l N/4 



(20) 



equation (19) becomes 



^-1 



cos- 



(2; + \)(2k + l);r 
2Ar 



2 

Ar/2 



4 



— t 



7 e Y\ cos- — — I- > / 2 , cos 



2iN/2) 



— 1 



=z 



cos- 



(2/ + l)(2Ar + \)7r 



1=0 



2;S^ 



1 

V2 



(2/ + l)/7;r , 2 



in 



(2i + l)(2p + l);r 



4(A^/4) 



Equation (21) can be subdivided further into: 
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4-, (2i + l)(2* + l);r 
Xl^ = 2^ cos— 

r=0 



2N 



1 

J2 



(2/ + l)p;r I 2 (2/ + 1)(2/7 + l);r 



(22) 



2: (2/ + l)(2A: + l);r 
2^ cos— 



2A^ 



1 

>/2 



(2/ + l)p;r 



2 4r 



(2; + \)(2p + !)«• 



By defining 



4) 



. / = 0,l,...,(A^/4)-l 



and 



(23) 



(24) 



it is seen that y\, is the IDCT of Y\ p of N/4 points and yl, is the IDCT-IV of 



Y2p of N/4 points. 



Notice that when Y\ and/or Y2 are zero, then y\, and/or yO:., do not need to be 
computed. This will speed-up the calculation of equation (22). 



Further analysis of the second term of equation (22) gives: 
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# 



2:^°^ ^ 



1 

V2 



(2/ 



-H)(2p + 1)«-] 



COS" 

" 4(N/A) 



(2/ + 1 + yX2* + 1)«- 
5] cos 



<>0 



2;/ 



1 

V2 



-y-T-' (2/ + 1 + y)p^ 



I 2 * (2i + \ + -)(2p + l)7r 

• J I'z - cos 77Tni\ 



7-' (2/ + l+^)(2Jt + l);r 



2^cos- 



i>0 



1 )rM-'.(-n'n cosSiiii£i.,P^(-ir'i'2,siJ^"'^e£±])£ 



4{N/4) 



(25) 



By defining 



(26) 



^ ' V^/4^^ ' 4(A^/4) 



(27) 



y\] is recognised as the IDCT of sequence {-\VY\, of N/4 points and yl] is 
recognised as the IDST-IV of sequence (-D'^'n^. of N/4 points. 

Notice that when n and/or Y2 are zero, then yl, and/or yZ, do not need to be 
computed. This will speed-up the calculation of equation (25). 
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From equations (15), (16). (18) and (19). it is seen that 



i^' (2/ + l)(2/r + lVr 
2-« 



cos- 



2N 



f' (2i + l+.^)(2Ar + l);r _ 
~ 2 (yli+y2,) 



ioO 



2iV 



k=0.l (N/2)-\ 



(28) 



In similar manner, the second temn of equation (14). can be analysed as 
follows: 

|:' . (2 i + \)(2k + l)7r 

isO 



^' . (2/-H)(2A:-H);r M ^ ^ cos^^ 



l«0 



1)££ 
2) 



. (2/ + l)(2* + lVr 




4 



cos- 



(2/-Hl)(2p-H);r| 



^ (2/ + l)p;r ^ _ 



^ . (2i+\) (2k + \)7r 



H" I 2 ^ (2/ + l)p;r (2 ^ 



cos- 



(2i- + n(2/7 + l)g 



4(Ar/4) 



(29) 



where 



for p = 0,1 AT/ 4 



(30) 



Z2,=Z,^. 

Equation (29) can be further subdivided to: 
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* . (2i -fl)(2A:-H);r 
= L«n 




-1 



aFTI?/'^^' "^^1(7^" VITTT^-/'' 4(NfA) 



(2/ + l)pff . / 2 ^ 



(2i-H)(2pH-l);r 



(31) 



2, sin 



27\r 



[rf n"!^' ^, (2/-H)p;r rr^' , (2/4-l)(2p^-l)?r 



By defining 



zl 



(32) 



,2;=i:L:Jz2,casi^i:yH^ (;,/4)-l . (33 

It is seen that zl] is the IDCT of sequence Zip of NM points and z2] is the 
IDCT -IV of sequence Z2p of N/4 points. 

Notice that when Zip and/or Z2^ are zero, then zi; and/or r2; do not need 
be computed. This speeds-up the calculation of equation (31). 

Further analysis of the second term of equation (31) gives: 
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4^ . (2/-H)(2^ + l)g 
2 Sin 



1 



f' (2i + l + Y)(2A: + l);r 

i>0 



2N 



P2-T-' (2/+1+Y)/"' n't,' , 



cos- 



N 

(2i + 1 + y)(2/7+1)«- 



4(i\^/4) 



AT. 



T-« (2/ + l + Y)(2A: + l);r 
2^ sin — 

•■0 



2N 



1 

V2 



sin- 



(2/ + l)(2£+l)£| 



By defining 



(34) 



-1 - I— f— (-n'Zi cos^ i = ox 



,(N/4)-\ 



p=0 



(35) 



(36) 



It is seen that z\] is the IDCT of sequence {-\YZ\p of N/4 points and zl] is 
the IDST -IV of sequence (-l)'^'Z2p of N/4 points. Notice that when Zip 
and/or Zip are zero, then z\] and/or zl] do not need to be computed. This 
speeds-up the calculation of equation (34). 

From equations (31). (32), (33). (34). (35) and (36) it is seen that: 
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N "W IS/ 

7-' (2/Tl + —)i2k + l)7r 



X2, 



(2/ -I- \){2k + l);r 
IN 



(zl'i + r2i) + T^sin 




(zi;"+z2';) 



(37) 



Therefore, the odd indexed DCT coefficients can be computed from equation 



Notice that in eq. (8b) and (38), the values of k will be k = 0,1,..., (A^/ 4)- 1. 
for a downsampling by a factor of 2. 

In Fig. 5, an illustration of the steps performed in the DCT domain when 
executing the algorithm according to the equations (8) and (9). Thus, first the 
two sequences of length N/2, (N=8 in this example), Y and Z are input at 501. 
Next, the second sequence Z is reversed in a step 503 and a sequence Z' is 
produced. 

The upper four lines in Fig. 5 show how the even indexed coefficients are 
calculated according to equation 8. The coefficients of Y are added with the 
appropriate coefficients of Z' in a step 505 and are multiplied by I/V2 at 513 
in order^o produce the even indexed coefficients of X at 517. 

The lower four lines show how the odd indexed DCT coefficients of X are 
produced. First sequence Y-Z' (see equation 9b) is produced in the step 505. 
and an inverse DCT transfomri (IDCT) is applied to this sequence at step 507. 
The resulting coefficients are multiplied by appropriate factors, i.e. 

2005 ^^^"^^^^ where n goes from 0 to N/2 - 1 , i.e. in this example from 0 to 3, 
IN 

at step 509 which produce sequence r„ of equation 9b. Then, at a step 51 1 , a 
DCT is performed on this sequence and the resulting coefficients are 
multiplied by I/V2 at step 513, as above. Notice that because of equation 
11b, the first coefficient after the DCT transformation also has to be multiplied 
by V2. this is also performed at step 513. After this, at step 515, equation 10 is 
performed. Thus, in the step 515 the fifth coefficient is subtracted from the 
sixth, the new sixth from the seventh and the new seventh is subtracted from 
the eighth coefficient. 



= ItM^^ -i-D'Xl,), k = 0,1 (Ar/2)- I. 



(38) 
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The coefficients of the sequence X can now be output in a step 517 in the 
order, from top to bottom in Fig. 5, X(0) X(2) X(4) X(6) X(1 ) X(3) X(5) X(7). 

Thus, for example, an image of QCIF format can be derived from an image in 
a GIF format without having to use any other transforms than 8x8 DCTs, if the 
GIF image were processed by using DGT applied in 8x8 blocks, by using the 
following method illustrated in the flow chart in fig. 2. 

First in block 201 four 8x8 adjacent DCT-point an-ays of a GIF format image 
are loaded into a memory as an array of size 16x16 points. Next, the 16-point 
DGT for each row of the 16x16 array is calculated in a block 203 using the 
equations (8) and (9) for the even and odd coefficients, respectively. Then, 
the coefficients of that row are stored in a memory 205. 

Thereupon it is checked in a block 207 if the current row was the last in the 
16x16 array. If this is not the case the row number is incremented in a block 
209 and the calculations in block 203 are repeated for the next row of the 
16x16 array. If, on the other hand, the 16 DGT coefficients for the last row 
have been calculated and stored in the memory, a block 21 1 fetches the 
16x16 DGT coefficients now stored in the memory 205 and loads these into 
the block 211. 

The procedure then continues in a similar manner for the computation of the 
columns, i.e. the method is applied in a column manner to the result that has 
been obtained from the row-computation. 

Hence, in a block 213 the DGT for the first column of the array loaded into the 
block 211 is calculated using the equations (8) and (9) for the even and odd 
coefficients, respectively, and the coefficients for that column are stored in a 
block 2 15." Thereupon, it is checked in a block 217 if the DGT for the column 
cun-ently calculated is the last that is required. If this is not the case the 
column number is incremented by one in a block 219 for the next column of 
the 16x16 an-ay and the calculations in block 213 are repeated for the next 
column of the 16x16 array. 

If, on the other hand, the 16 DGT coefficients for the last column have been 
calculated and stored in the memory block 215, a block 221 fetches the 
16x16 DGT coefficients stored in the memory 215 and loads these into the 
block 221. 

Next, in the block 221 , the 8x8 low frequency DGT coefficients are extracted 
from the 16x16 DGT coefficients. The 8x8 DGT coefficients are then output in 
a block 223. 



If only the Mxl^^rows and K columns) DCT coefficialrare required then 
the computation of the rows remains the same but then for each row, only the 
first K coefficients are computed. Then, during the computation of the 
columns, the first K columns are processed and for each of these columns 
the low frequency M coefficients are calculated. This method is useful for 
undersampling by a different factor in each dimension (for example 
undersampling by 2 in dimension x and by 4 in dimension y). Thereafter the 
MxK low frequency coefficients of the in this manner obtained 16x16-point 
DCT are extracted and transmitted. The method can also be applied in a 
similar manner to compute arbitrary number of DCT coefficients for each 
row/column. 



The method can be used in a number of different applications. As an 
example, suppose that an image compression scheme like JPEG, uses 8x8 
DCTs. Suppose that the compressed image is received. An undersampling 
(downsampling) of the image by a factor of 2 in each dimension would require 
keeping the low frequency 8x8 DCT coefficients out of a block of 16x16 DCT 
coefficients. These 16x16 DCT blocks can be computed with the method 
described above by having the 4(8x8) DCT coefficients that constitute the 
16x16 block. 



Notice that in the Row-Column (RC) computation, a further speed-up can be 
obtained if the coefficients of a certain row/column are zero, which normally is 
the case for high frequency DCT coefficients. In practice, in video coding 
about 80% of DCT coefficients are zero. i.e. the ones corresponding to high 
frequencies. Therefore, faster computation can be achieved by taking this 
infomnation into account. For example, if all DCT coefficients of the two sub- 
rows of the fourth row of Z are zero, there is no reason to try to compute the 
DCT coefficients for that row. Another case can for example be if the DCT 
coefficients of row 3 of are zero, all computations involving these 
coefficients can then be skipped. 

Notice that the scheme can be applied in a recursive manner. For example, if 
QCIF, CIF and SCIF are required then 8x8 DCTs are used for the SCIF. The 
CIF is obtained by calculating the 8x8 DCTs of the 16x16 block that consists 
of 4(8x8) DCT coefficients of the SCIF. Then the QCIF can be obtained by 
keeping only the 4x4 out of the 8x8 DCT coefficients of each 8x8 block of the 
CIF or by again calculating the 8x8 DCTs of the 16x16 block that consists of 
4(8x8) DCT coefficients of the CIF. This has interesting applications in 
scalable image/video coding schemes and in image/video transcoding with 
spatial resolution reduction schemes. 
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Alternatively, from each 8x8 blocks of DCT coefficients, one can keep only 
the 4x4 low frequency coefficients. Then from 4(4x4) blocks of DCT 
coefficients one can compute an 8x8 block of DCT coefficients. 

The method as described herein has a number of advantages. Thus, 
standard DCT/IDCT hardware can be used, since there is no requirement of 
using 16x16 DCT, when 8x8 DCT/IDCT is available. 

There is no requirement for fully decoding, filtering and downsampling in the 
spatial domain and fully encoding by DCT again. There are less memory 
requirements, since computation of a 16x16 DCT requires much more 
memory and data transfers compared to the 8x8 case. 

The method can be used for undersampling by various factors. For example, 
if 8x8 DCTs are used and an undersampling by a factor of 4 in each 
dimension is desired, then only the low frequency 2x2 DCT coefficients out of 
the 8x8 are to be kept, which is not advantageous from a compression 
efficiency point of view. However, with the method as described herein one 
can calculate the 16x16 DCT coefficients out of the available 4(8x8) DCTs 
and keep only the 4x4 of them, or compute them directly. This is more 
efficient than by keeping the 2x2 out of the 4x4 and will result in better image 
quality. One can also compute an 8x8 block of DCT coefficients by 4(4x4) 
blocks of DCT coefficients. Each of the 4x4 blocks of DCT coefficients can be 
part of an 8x8 block of DCT coefficients. 

The method results in fast computation when many of the DCT coefficients of 
the 8x8 blocks are zero, since computation of rows and columns DCTs/IDCTs 
can be avoided for that row/column. 

Further, in L.H. Kieu and K.N. Ngan, "Cell-loss concealment techniques for 
layered video codecs in an ATM network", IEEE Trans. On Image Processing, 
Vol. 3, No. 5, pp. 666-677, September 1994, a frequency scalable video 
coding scheme is described. The scheme uses 8x8 DCTs for the upper 
layers. The base layer is coded using 4x4 DCTs. The low frequency 4x4 DCT 
coefficients of each of the 8x8 blocks of the upper layer are used at the base 
layer. 

With the DCT algorithms as described herein, the frequency scalable video 
codec described in the above cited paper by L.H. Kieu et al. can be modified 
as follows: 

- Compute the low-frequency 8x8 DCT coefficients by applying the proposed 
algorithm in 4(8x8) blocks of DCT coefficients of the upper layer. Then code 
the base layer by standard techniques using 8x8 DCT algorithms. This as an 
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efficient techniqwror all frequency scalable systems. T^^method has the 
following advantages in this case: 

The video coding is applied in 8x8 blocks. This results in better coding 
efficiency compared to using 4x4 blocks. The motion vectors have to be 
computed for 8x8 blocks. Therefore less motion vectors need to be 
transmitted (or stored) compared to using 4x4 blocks. Also, variable length 
coding schemes are well studied for 8x8 DCT coefficients compared to the 
4x4 case. 



Notice that an alternative method would be to keep the 4x4 low frequency 
DCT coefficients of each 8x8 DCT block of the upper layer and by having 
4(4x4) of these blocks to compute the 8x8 DCT of these 4x4 blocks. Such an 
approach is illustrated in fig. 3. 

Thus, in fig. 3 a flow chart illustrating different steps carried out when 
transcoding a still image by reducing the resolution by a factor 2 in each 
dimension, is shown. First in a block 301 an image compressed in the DCT 
domain is received. The received image is then entropy decoded in a block 
303, for example by a Huffman decoder or an arithmetic decoder. 

Thereupon, in a block 305, 8x8 blocks of DCT coefficients of the decoded full 
size image are obtained, and in a block 307 the low-frequency 4x4 DCT 
coefficients from each 8x8 block are extracted. 8x8 DCTs are then obtained 
in a block 309 by means of applying the row-column method described above 
for four adjacent 4x4 blocks of low-frequency coefficients. 

Next, each 8x8 blocks resulting from the row-column method in the block 309 
is entropy coded in a block 31 1 and then transmitted or stored in a block 313. 
Notice that the DCT coefficients might have to re-quantized before entropy 
coding in order to achieve a specific compression factor. 

In fig. 4 a general view of a video transcoder employing the teachings of the 
method described above, is shown. The transcoder receives an incoming 
bitstream of a compressed video signal. The received compressed video 
signal is decoded in block 401 wherein the motion vectors of the 
decompressed video signal are extracted. The motion vectors are fed to a 
block 403 in which a proper motion vector scaling in accordance with the 
transcoding performed by the transcoder is executed, as for example in this 
case a division by 2 is performed. The image information not related to the 
motion vectors are fed to a block 405 from block 401. 

In block 405 DCT blocks of size 8x8 are obtained. The DCT blocks of size 
8x8 are then fed to a block 407 in which four adjacent 8x8 DCT blocks are 
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combined to one, undersampled, 8x8 DCT block according to the method 
described above. The new. undersampled, 8x8 DCT blocks are then 
available in a block 409. A block 41 1 then encodes the 8x8 DCT blocks in the 
block 409, which also can involve a re-quantization of the DCT coefficients, 
together with the scaled motion vectors from block 403 and forms a combined 
compressed output video signal. 

Furthermore, in US A 5,107,345 and US A 5,452,104 an adaptive block size 
image compression method and system is proposed. For a block size of 
16x16 pixels, the system calculates DCTs for the 16x16 blocks and the 8x8, 
4x4 and 2x2 blocks that constitute the 16x16 block. The algorithm as 
described herein can be used to compute the NxN block by having the 4{W2 
X N/2) DCT coefficients. For example, by having the DCT coefficients of each 
2x2 block one can compute the DCT coefficients for the 4x4 blocks. By 
having the DCT coefficients for each 4x4 block one can compute the DCT 
coefficients for the 8x8 blocks, etc. The DCT algorithm can therefore be used 
for the efficient coding in the schemes described in US A 5,107,345 and US A 
5.452,104. 



